1.  Introduction 


The  structural  information  of  man-made  targets  (ground-based  vehicles)  in  forward-looking 
infrared  (FLIR)  imagery  appears  different  from  that  of  their  surrounding  background.  Automatic 
target  detection  techniques  use  the  radiated  energy  from  a  man-made  target  in  FLIR  imagery  to 
detect targets (1-2). Energy distribution patterns on the targets are determined by the operational
conditions  of  the  vehicles,  weather  conditions  and  solar  loading,  atmospheric  conditions,  and 
other  factors.  Similarly,  the  apparent  temperature  of  background  objects  is  determined  by 
environmental  conditions. 

Research on infrared imaging systems and on automatic target detection/recognition
(ATD/R) started at approximately the same time, nearly 40 years ago; see the review paper
by Ratches et al. (3). The survey papers by Bhanu (4-5) summarize various algorithms for
ATD/R in static FLIR images that were developed up to the early 1990s; these algorithms
predominantly use traditional image processing approaches for (optical) picture processing.

Since the early 1990s, combinations (or mixtures) of classical/traditional image processing and
emerging techniques have been proposed for the FLIR target detection problem. For instance,
the traditional approach of segmenting the target from the background still draws the attention of
FLIR researchers. Meanwhile, various emerging techniques are also being studied in
segmentation algorithms. For example, Sang et al. (6) use a Hopfield neural network with an edge
constraint, and Sun et al. (7) exploit fuzzy thresholding and edge detection in their segmentation
algorithms.

There have also been studies of “matched filtering” algorithms that assume a specific
shape/model for the FLIR signature of a man-made target. For instance, Erinsse et al. (8) use a
DOG (difference of Gaussians) filter as a bandpass filter to enhance the signature of man-made
targets in FLIR imagery, and use the resultant information to detect (“pull”) these targets. Park
et al. (9) assume that a man-made target exhibits a peak response when a Mexican hat filter is
applied to FLIR imagery; in this case, the Mexican hat filter is used as the mother wavelet of the
multidimensional wavelets. The filtered image is thresholded to obtain the location of man-made
targets. Zhou et al. (10) use five simple matching patterns (shapes of rectangle, square, oval,
rounded-rectangle, and circle) for all targets. They apply the Gabor function (a sinusoidal
function weighted by a Gaussian function) to both their matching patterns and the IR image to
generate Gabor feature vectors. The similarity measure between the Gabor feature vector at each
image point and that of the matching patterns is calculated. The similarity values are then
thresholded to obtain the locations of the targets. Weber et al. (11) use six Gabor function
filters: two directional RGFs (real Gabor functions) to detect object height and width
and four directional IGFs (imaginary Gabor functions) to detect the left, right, top, and bottom
edges of the object. The input image is correlated with each of the six Gabor filters, and these
six correlation outputs are then quadratically combined; the final detection is performed using a
threshold.

Neural network technologies are usually used as the classifier in an automatic target recognition
system; see Roth (12) and Rizvi et al. (13) for surveys. Neural networks have also been
exploited to classify each pixel in the FLIR imagery as either man-made structure or
background (clutter) in order to detect targets. For example, see the articles by Dwan et al. (14)
and Ramanan et al. (15).

The multiscale fractal method is another emerging technique that has been used for FLIR target
detection. Xue et al. (16) compute fractal dimensions through several resolution levels of an
image. The K-means method is then used to classify the target and background based on the
resultant fractal dimensions. Shekarforoush et al. (17) have also used a multi-fractal formalism
for object detection and tracking in FLIR sequences.

The  apparent  temperature  contrast  between  a  man-made  target  and  its  surrounding  is  a  key  factor 
in  most  FLIR  target  detection  algorithms.  A  target  with  high  contrast  (temperature)  is  relatively 
easy  to  detect;  this  property  is  exploited  by  most  of  the  above-mentioned  algorithms.  However, 
for  low  contrast  targets,  some  form  of  structural  information  needs  to  be  exploited.  In  most  of 
the  above-mentioned  algorithms,  the  image  characteristics  are  globally  analyzed  either  in  the 
filtering  operation  or  in  the  feature  extraction  process.  This  is  achieved  by  comparing  the  pixels 
representing  the  target  with  the  other  pixels  in  the  entire  image.  This  global-based  approach 
neglects  the  local  variations  between  the  signatures  of  the  target  and  its  surrounding  medium. 
Some  researchers  have  used  double-gated  filters  that  exploit  these  local  variations  for  target 
detection; see Gregoris et al. (18) and Der et al. (19). In these methods, the authors identify an
inner  target  window  surrounded  by  an  outer  window.  The  inner  window  carries  information  on 
the  FLIR  signature  properties  of  the  target  zone,  and  the  outer  window  is  used  to  identify  the 
signature  properties  of  the  nearby  background.  In  reference  (18),  a  filter  is  used  to  measure  the 
difference  in  the  mean  pixel  values  of  the  inner  target  window  and  the  outer  background 
window,  and  the  pixel  standard  deviation  in  the  background  window.  The  ratio  of  the  mean 
difference  to  the  standard  deviation,  that  is,  a  “normalized”  mean  difference,  is  compared  to  a 
pre-assigned  threshold.  The  inner  windows  with  ratios  exceeding  the  threshold  are  labeled  as 
possible  man-made  targets.  In  reference  (19),  more  comprehensive  features  (such  as  contrast 
difference,  gradient  strength,  straight  edge  information,  etc.)  are  computed  from  the  inner 
window  and  the  outer  window,  and  exploited  for  target  detection  purposes. 
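
The normalized mean difference described for reference (18) can be illustrated with a short sketch. This is not the implementation of reference (18); the function name `normalized_mean_difference`, the signed (rather than absolute) difference, and the handling of a perfectly flat background are assumptions made only for illustration.

```python
import numpy as np

def normalized_mean_difference(inner, background):
    """Ratio of (inner mean - background mean) to the background standard deviation.

    inner      : pixel values of the inner target window
    background : pixel values of the outer background window
    Hypothetical helper; the sign convention and the flat-background rule are assumptions.
    """
    inner = np.asarray(inner, dtype=float).ravel()
    background = np.asarray(background, dtype=float).ravel()
    diff = inner.mean() - background.mean()
    sigma = background.std()
    if sigma == 0.0:
        # Assumption: a flat background gives an unbounded ratio unless the means agree.
        return 0.0 if diff == 0.0 else float(np.copysign(np.inf, diff))
    return float(diff / sigma)

# Example: a warm inner window against a cooler, slightly varying background.
ratio = normalized_mean_difference([10, 10, 10, 10], [1, 3, 1, 3])
is_candidate = ratio > 3.0  # the threshold 3.0 is an arbitrary illustrative value
```

An inner window whose ratio exceeds the chosen threshold would be labeled as a possible man-made target.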

In this paper, the local similarity measure between the inner and outer window is used to explore
the differences between structural information of a target and its surrounding scene in FLIR
imagery. Due to the signature variability of FLIR images, the presence of individual structural
patterns, such as edges, shapes, textures, etc., cannot be reliably predicted. We use an
eigenspace analysis to represent the variations of the structural information of a target and its
background. We first construct the inner and outer image vectors that represent the
multi-dimensional signal properties of the FLIR signatures of the target and its background,
respectively. While the inner image vector is solely constructed from the inner window (box),
the outer image vectors are constructed from a more extended outer window that consists of
many overlapping outside boxes that are of the same size as the inner box. These two image
vectors are then processed by two directional gradient operators. The resulting image gradient
vectors are mapped by two transformations, P1 and P2, via principal component analysis (PCA)
and the eigenspace separation transform (EST) (20), respectively. The first transformation P1 is
a function of the inner image vector. The second transformation P2 is a function of both the
inner and outer image vectors. The target detection problem is formulated as a statistical
hypothesis testing problem. That is, for the hypothesis H1 (inner image vector is a target), the
difference between the components of P1 and P2 is small. For the hypothesis H0 (inner image vector
is clutter), the difference between these two functions is large.

This  paper  is  organized  as  follows.  The  image  feature  set  is  described  in  section  2  using  image 
vector  construction  and  gradient  signals.  The  adaptive  target  detection  algorithm  is  discussed  in 
section  3,  where  EST  and  PCA  are  outlined,  and  the  hypotheses  and  target  detection  procedure 
are presented in detail. In section 4, results of testing the proposed algorithm on two large FLIR
image databases are presented using ROC (receiver operating characteristic) curves. The
proposed  algorithm  is  also  compared  with  other  detection  algorithms.  Conclusions  are  given  in 
section  5. 


2.  FLIR  Imagery  Feature  Set 


In this section, we describe how the image vectors are constructed to represent both a target
region and its surrounding scene. Directional gradient filters are used to preprocess the image
vectors.  The  resulting  gradient  image  vectors  are  used  as  the  feature  set  in  our  target  detection 
algorithm. 

2.1  Image  Vectors 

For a given pixel in the image, a window (called the inner box) centered at the pixel is constructed as
shown in figure 1. The size of the inner box is determined by the largest target size in the target
library, which is based on a known or estimated range. Similarly, an outer window, Xout, as
shown in figure 1, is constructed as a bigger window that surrounds but does not include the
inner window. The size of the outer window in our implementation is three times the width and
height of the inner window. The outer window is then partitioned into N small overlapping outer
boxes (shifted by 1 pixel in the x or y direction) that have the same size as the inner box. The inner
image vector, denoted by Xin, is the vectorized image data of the inner box. It is written as

$$X_{in} = [x_1, x_2, \ldots, x_m]$$

where m = w × h, w and h are the width and height of the inner box, and x_j is the image pixel
value, j = 1, 2, ..., m.


Figure  1.  Image  vectors  construction. 

The outer window, Xout, contains N smaller outer boxes represented by Xout,i, where i =
1, 2, ..., N. Each Xout,i is the vectorized image data of one of the N
overlapping boxes that have the same dimension m as the inner box. It is written as

$$X_{out} = [X_{out,1}, X_{out,2}, \ldots, X_{out,N}]$$

$$X_{out,i} = [y_1, y_2, \ldots, y_m], \quad i = 1, 2, \ldots, N$$

where y_j is the image pixel value, j = 1, 2, ..., m.
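
The construction of the inner and outer image vectors can be sketched as follows. This is a minimal illustration, not the implementation used in our experiments: the function name `image_vectors`, the rule that an outer box is kept only if it does not overlap the inner box, and the requirement that the whole outer window lies inside the image are assumptions.

```python
import numpy as np

def image_vectors(img, row, col, h, w, stride=1):
    """Return (x_in, X_out) for the inner box centered near pixel (row, col).

    x_in  : vector of length m = h * w (vectorized inner box)
    X_out : array of shape (N, m), one row per outer box
    The outer window is 3h x 3w and must lie fully inside img (assumption).
    """
    img = np.asarray(img, dtype=float)
    top, left = row - h // 2, col - w // 2        # top-left corner of the inner box
    o_top, o_left = top - h, left - w             # top-left corner of the outer window
    if (o_top < 0 or o_left < 0 or
            o_top + 3 * h > img.shape[0] or o_left + 3 * w > img.shape[1]):
        raise ValueError("outer window falls outside the image")

    x_in = img[top:top + h, left:left + w].ravel()

    boxes = []
    for r in range(o_top, o_top + 2 * h + 1, stride):
        for c in range(o_left, o_left + 2 * w + 1, stride):
            # Assumption: skip every box that shares pixels with the inner box.
            if abs(r - top) < h and abs(c - left) < w:
                continue
            boxes.append(img[r:r + h, c:c + w].ravel())
    return x_in, np.stack(boxes)

# Example on a synthetic 15 x 15 image with a 3 x 3 inner box.
demo = np.arange(15 * 15).reshape(15, 15)
x_in, X_out = image_vectors(demo, 7, 7, 3, 3)
```

Under the non-overlap assumption and a 1-pixel shift, the number of outer boxes is N = 4(w + h), which gives N = 24 for the 3 × 3 example.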

2.2  Gradient  Image 

The  performance  of  many  image  processing  algorithms  is  improved  by  first  applying  gradient 
operators  to  enhance  the  edge  information  within  the  image.  Many  studies  of  the  human  visual 
system  provide  evidence  that  the  brain  extracts  edge  and  motion  information  early  in  the  visual 
processing  (21-22).  Our  experiments  likewise  show  that  the  proposed  algorithm  performs  much 
better  using  the  gradient  images  than  the  pixel  gray  levels  of  the  original  FLIR  imagery. 

Two gradient images are formed by passing the input image through two directional high-pass
filters. Filter Fh is designed to high-pass filter the image in the horizontal direction x. Similarly,
Fv is the vertical high-pass filter used to enhance the vertical edges in the y direction. The two
directional high-pass filters are defined as two separable differentiation operations, that is,

$$F_h = \frac{1}{2}\begin{bmatrix} -1 & 1 \end{bmatrix}; \qquad F_v = \frac{1}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$

Examples  of  gradient  images  are  shown  in  figure  2.  Note  that  the  gradient  (edge)  information 
about the target (truck) in all directions is preserved. However, the edge information about hot
roads is suppressed in the x direction but emphasized in the y direction. Usually a target
contains  edge  information  in  all  directions.  The  background  scene  might  only  contain  edge 
information  in  a  particular  direction.  In  this  paper,  the  gradient  images  in  both  directions  are 
passed  to  the  detection  technique  where  a  detection  decision  is  made  by  using  the  detection  result 
from  both  of  the  input  gradient  images. 
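
A minimal sketch of the two directional difference filters is given below. The function name `gradient_images`, the forward-difference alignment, and the zero fill in the last column and row are assumptions; the text does not specify the boundary handling.

```python
import numpy as np

def gradient_images(img):
    """Apply the half-difference filters [-1, 1]/2 along x (columns) and y (rows).

    Returns (gx, gy) with the same shape as img. The last column of gx and the
    last row of gy are left at zero (boundary handling is an assumption).
    """
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = 0.5 * (img[:, 1:] - img[:, :-1])   # horizontal difference
    gy[:-1, :] = 0.5 * (img[1:, :] - img[:-1, :])   # vertical difference
    return gx, gy

# Example: an intensity ramp along x has a constant x gradient and no y gradient.
ramp = np.array([[0, 2, 4],
                 [0, 2, 4]])
gx, gy = gradient_images(ramp)
```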
Figure 2. Examples of gradient images: (a) original image, (b) gradient image obtained
by  the  highpass  filtering  in  the  x  direction,  and  (c)  gradient  image  obtained  by 
the  highpass  filtering  in  the  y  direction. 


3.  Adaptive  Detection  of  Target  in  FLIR  Imagery 


The two image vectors Xin and Xout that were defined above are mapped into two transformations
P1(Xin) and P2(Xin, Xout) via PCA and EST, respectively. Note that P1(Xin) is a function of Xin
only, while P2(Xin, Xout) is a function of both Xin and Xout.

The target detection problem is formulated as a hypothesis test. For hypothesis H1 (a target is
within the inner box), the difference between P1(Xin) and P2(Xin, Xout) should be small. For
hypothesis H0 (clutter is within the inner box), the difference between these two functions
should be large. That is:

$$H_0: \; \left\| P_1(X_{in}) - P_2(X_{in}, X_{out}) \right\|^2 > \delta$$

$$H_1: \; \left\| P_1(X_{in}) - P_2(X_{in}, X_{out}) \right\|^2 < \delta \qquad (1)$$

where δ is a pre-selected threshold. The two functions P1(Xin) and P2(Xin, Xout), which are
obtained via PCA and EST, are discussed in the following subsection.

3.1  EST  and  PCA 

The EST was proposed by Torrieri as a preprocessor to a neural binary classifier (20).

The EST calculates the difference covariance matrix C,

$$C = C_{out} - C_{in}$$

where Cout and Cin are the covariance matrices of the outer and inner image vectors, respectively.
That is,

$$C_{in} = (X_{in} - \mu_{X_{in}})(X_{in} - \mu_{X_{in}})^T$$

$$C_{out} = \frac{1}{N}\sum_{i=1}^{N} (X_{out,i} - \mu_{X_{out,i}})(X_{out,i} - \mu_{X_{out,i}})^T$$

where μx is the mean value of either x = Xin or x = Xout,i. N is the number of small outer boxes
forming the outer window. The eigenvalues and eigenvectors of the difference covariance matrix C
are calculated. The eigenvectors associated with the positive eigenvalues
are referred to as EST positive eigenvectors Vest+. The eigenvectors associated with the negative
eigenvalues are referred to as EST negative eigenvectors Vest-. Because C is symmetric,
all EST positive and negative eigenvectors are orthogonal to each other. The positive
eigenvectors Vest+ mainly represent the outer image vector subspaces. Similarly, the negative
eigenvectors Vest- represent the inner image vector subspaces. In the design of the covariance
matrix in this paper, only one inner image vector forms the inside covariance matrix, so there is
only one EST negative eigenvector.
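
The EST eigenvector split can be sketched numerically as follows. This is an illustration under stated assumptions rather than the implementation of (20): the difference matrix is taken as C = Cout − Cin, each image vector has its own mean value removed, the outer covariance is averaged over the N outer boxes, and the function name `est_eigenvectors` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def est_eigenvectors(x_in, X_out, tol=1e-10):
    """Split the eigenvectors of C = C_out - C_in by the sign of their eigenvalues.

    x_in  : inner image vector, shape (m,)
    X_out : outer image vectors, shape (N, m)
    Returns (V_pos, V_neg, eigenvalues); eigenvectors are stored as columns.
    """
    x_in = np.asarray(x_in, dtype=float)
    X_out = np.asarray(X_out, dtype=float)
    a = x_in - x_in.mean()                                # mean-removed inner vector
    B = X_out - X_out.mean(axis=1, keepdims=True)         # mean-removed outer vectors
    C = B.T @ B / len(B) - np.outer(a, a)                 # assumed form: C_out - C_in
    vals, vecs = np.linalg.eigh(C)                        # symmetric eigendecomposition
    scale = max(float(np.abs(vals).max()), 1.0)
    V_pos = vecs[:, vals > tol * scale]
    V_neg = vecs[:, vals < -tol * scale]
    return V_pos, V_neg, vals

# Example with synthetic data: a strong inner vector against weak outer vectors.
rng = np.random.default_rng(0)
X_out_demo = rng.standard_normal((24, 9))
x_in_demo = 10.0 * rng.standard_normal(9)
V_pos, V_neg, vals = est_eigenvectors(x_in_demo, X_out_demo)
```

Because Cout is positive semidefinite and Cin has rank one, C can have at most one negative eigenvalue, which matches the single EST negative eigenvector noted above.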


An example illustrating the EST property is shown in figure 3, where two target chips (truck and
tank) are considered as the outer and inner image vectors, respectively. In this example, the
truck image is considered as the outer image vector (in a real scenario, it would be a
clutter chip), while the tank image is considered as the inner image vector. The original images
of the truck and tank are shown in figures 3a-b. The positive eigenvector Vest+ and the negative
eigenvector Vest- are shown in figures 3c-d. From figure 3, we can observe that Vest+ contains
the truck and the shadow of the tank. The negative eigenvector Vest- contains mainly the tank, but the
shadow of the truck is still visible.


Figure  3.  Examples  of  EST  and  PCA. 
In PCA, the eigenvectors of the inside covariance matrix Cin and outside covariance
matrix Cout are calculated separately. We call the eigenvectors obtained from Cout the PCA outside
eigenvectors Vpca,out. Similarly, the eigenvectors obtained from Cin are called the PCA inside
eigenvectors Vpca,in. The PCA inside eigenvector Vpca,in is, of course, not expected to be orthogonal to
the PCA outside eigenvectors Vpca,out. Since there is only one inner image vector to form the inside
covariance matrix, there is only one non-zero PCA inside eigenvector. Therefore, Vpca,in is the
normalized version of Xin, that is,

$$V_{pca,in} = \beta X_{in}$$

where β is the reciprocal of the norm of the inner image vector; the energy (squared norm) of
Xin equals the corresponding eigenvalue.
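
As a small numerical check of the normalization, the sketch below treats the inside matrix as the rank-one product of Xin with itself, ignoring mean removal for simplicity. The function name `pca_inside_eigenvector` is hypothetical.

```python
import numpy as np

def pca_inside_eigenvector(x_in):
    """Return the unit-norm inside eigenvector and the energy of x_in."""
    x = np.asarray(x_in, dtype=float)
    energy = float(x @ x)                 # squared norm of the inner image vector
    if energy == 0.0:
        raise ValueError("inner image vector has zero energy")
    return x / np.sqrt(energy), energy

# Example: for x = (3, 4) the norm is 5 and the energy is 25.
x = np.array([3.0, 4.0])
v, lam = pca_inside_eigenvector(x)
C_in_demo = np.outer(x, x)                # rank-one inside matrix (mean removal omitted)
```

Multiplying `C_in_demo` by `v` returns `lam * v`, so the normalized vector is the eigenvector and the energy is its eigenvalue.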

3.2  Hypotheses 

A visual motivation for the formulation in equation 1 is illustrated in figure 4. For simplicity,
assume that the outer window consists of only two vectors, Xout,1 and Xout,2. The inner image
vector is Xin, which is aligned in the same direction as the PCA inside eigenvector Vpca,in. After
the EST procedure, the positive eigenvectors Vest+,1 and Vest+,2 and the negative eigenvector Vest- are aligned
approximately with the outer and inner image vectors, respectively. In addition, all positive and
negative eigenvectors are orthogonal to each other.


[Figure 4, panels (a) and (b): vector diagrams titled "Hypothesis H1: target inside" and
"Hypothesis H0: clutter inside", showing Xin (Vpca,in), Xout,1, Xout,2, Vest-, Vest+,1, and Vest+,2.]

Figure  4.  Visualization  of  hypotheses. 

Consider  the  EST  properties  as  demonstrated  by  the  results  in  figure  3.  The  negative  EST 
eigenvector  predominately  exhibits  features  that  originally  appear  in  the  inner  window  (the  tank) 
and  the  positive  EST  eigenvector  contains  the  features  that  appear  mainly  in  the  outer  window 
(the  truck).  Thus,  the  negative  and  positive  EST  eigenvectors  approximately  align  with  the  inner 
and  outer  windows,  respectively.  We  use  these  properties  for  our  hypothesis  testing  problem  in 
the  following  fashion.  For  hypothesis  H0,  where  clutter  is  within  the  inner  box,  the  re-alignment 
of  EST  eigenvectors  with  respect  to  the  PCA  eigenvectors  causes  the  negative  EST  eigenvector 
Vest- to be distant from the PCA inside eigenvector Vpca,in. For hypothesis H1, where the target is
within the inner box, the EST negative eigenvector Vest- is more closely aligned with the PCA
inside eigenvector Vpca,in.

When  there  is  clutter  within  the  inner  box  that  is  somewhat  similar  to  its  surroundings,  the 
difference  between  the  inner  and  outer  image  vectors  is  not  large.  The  EST  procedure  mixes  the 
inner  and  outer  image  vectors  and  generates  the  positive  and  negative  eigenvectors.  The  EST 
positive  eigenvectors  present  mainly  the  subspace  corresponding  to  the  outer  image  vectors, 
which  are  also  similar  to  the  inner  clutter  image  vector.  Because  the  EST  negative  eigenvector 
has  to  be  orthogonal  with  the  EST  positive  eigenvectors,  the  EST  negative  eigenvector  is  more 
distant  from  the  inner  image  vector.  This  causes  the  residual  energy  between  the  EST  negative 
eigenvector Vest- and the PCA inside eigenvector Vpca,in to be large. On the other hand, when
there  is  a  target  in  the  inner  box,  there  is  a  larger  difference  between  the  inner  and  outer  image 
vectors.  The  EST  negative  eigenvector  is  generated  to  be  orthogonal  with  the  EST  positive 
eigenvectors,  which  are  already  very  different  from  the  inner  image  vector.  The  EST  negative 
eigenvector  is  then  closer  to  the  inner  image  vector  that  is  proportional  to  the  PCA  inside 
eigenvector.  Therefore,  the  residual  energy  between  the  EST  negative  eigenvector  Vest-  and  the 
PCA inside eigenvector Vpca,in is small.


A  common  method  of  measuring  the  difference  between  two  sets  of  vectors  is  to  project 
elements  of  one  set  upon  the  other.  One  way  we  could  accomplish  this  is  by  projecting  the  inner 
image  vector  into  the  subspaces  that  are  generated  by  the  EST  positive  or  PCA  outside 
eigenvectors.  The  relative  error  energy  is  calculated  as  follows, 


$$E = \frac{\left\| X_{in} - X_{in+} \right\|^2}{\left\| X_{in} \right\|^2}$$

where Xin+ is the reconstruction of Xin using the subspace generated by the EST positive
eigenvectors. However, this relative error energy is close to 1 for both target and clutter inside
the inner box. This is because the inner image vector Xin is almost orthogonal to the positive
eigenvectors. Therefore, this is not an effective measurement. We use the following hypothesis
test instead:
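
The relative error energy of a projection can be sketched as follows, assuming the subspace is given by orthonormal column vectors and that the energy is the squared Euclidean norm. The function name `relative_error_energy` is hypothetical.

```python
import numpy as np

def relative_error_energy(x, V):
    """Relative energy of x left after projecting onto the columns of V.

    x : vector of shape (m,)
    V : matrix of shape (m, k) whose columns are assumed orthonormal
    """
    x = np.asarray(x, dtype=float)
    V = np.asarray(V, dtype=float)
    reconstruction = V @ (V.T @ x)        # projection of x onto span(V)
    return float(np.sum((x - reconstruction) ** 2) / np.sum(x ** 2))

# Example: a vector orthogonal to the subspace keeps all of its energy (E = 1),
# while a vector inside the subspace is reconstructed exactly (E = 0).
x_demo = np.array([1.0, 0.0, 0.0])
E_orthogonal = relative_error_energy(x_demo, np.array([[0.0], [1.0], [0.0]]))
E_aligned = relative_error_energy(x_demo, np.array([[1.0], [0.0], [0.0]]))
```

The first case mirrors the situation described above: when the inner image vector is nearly orthogonal to the positive eigenvectors, E stays near 1 regardless of what the inner box contains.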


a.  Under the null hypothesis (clutter inside), mixing of outer and inner image vectors, Cout and
Cin, via EST results in a measurable change in “alignment” of EST positive and negative
eigenvectors with respect to PCA inside and outside eigenvectors. That is,

$$H_0: \; \left\| V_{pca,in} - V_{est-} \right\|^2 > \delta$$

b.  Under the alternative hypothesis (target inside), mixing of outer and inner image vectors,
Cout and Cin, via EST does not alter the “alignment” of EST positive and negative
eigenvectors with respect to PCA inside and outside eigenvectors. That is,

$$H_1: \; \left\| V_{pca,in} - V_{est-} \right\|^2 < \delta$$

Comparing  the  PCA  outside  eigenvector  and  EST  positive  eigenvector  could  also  be  used  for 
exploring  the  differences  between  inner  and  outer  image  vectors.  However,  this  analysis  is  in  a 
multi-dimensional  space.  In  the  following  discussion  of  detection  criterion,  we  use  only  the  PCA 
inside  eigenvector  and  EST  negative  eigenvector. 

3.3  Target  Detection  Procedure 

The overall procedure of the target detection algorithm is outlined in the following steps (as
shown in figure 5).


1.  The input FLIR image is passed through the gradient operators as mentioned above, to
obtain two gradient images Ix and Iy, which represent the gradient images in the x and y
directions, respectively.


2.  For each pixel of the gradient images, the inner window and outer window are formed to
obtain the inner and outer image vectors, Xin and Xout.


3.  Two transformations P1(Xin) and P2(Xin, Xout) are calculated via PCA and EST, where
P1(Xin) represents the PCA inside eigenvector Vpca,in and P2(Xin, Xout) represents the EST
negative eigenvector Vest-. That is,

$$P_1(X_{in}) = V_{pca,in}$$

$$P_2(X_{in}, X_{out}) = V_{est-}$$


4.  The residual energy Evi is obtained. That is,

$$E_{vi} = \left\| V_{pca,in} - V_{est-} \right\|^2, \quad \text{if } V_{pca,in} \cdot V_{est-} > 0$$

$$E_{vi} = \left\| V_{pca,in} + V_{est-} \right\|^2, \quad \text{if } V_{pca,in} \cdot V_{est-} < 0$$

If Vpca,in and Vest- have the same sign (positive inner product), the residual energy is the
energy of the difference of these two vectors. If they have opposite signs (negative inner
product), the residual energy is the energy of their sum.

5.  For each pixel, two values of residual energy, Evi,x and Evi,y, are calculated, corresponding to
the x and y directions. The minimum of these two residual energies is retained.

$$E_{min} = \min(E_{vi,x}, E_{vi,y})$$

6.  If this minimum Emin is above a threshold δ, the pixel is declared clutter. If this minimum
is below δ, the pixel is declared as target.
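
Steps 3 through 6 can be sketched for a single pixel as follows. This is a minimal illustration under several assumptions, not the implementation used in our experiments: the difference matrix is taken as Cout − Cin with each image vector's mean value removed and Cout averaged over the N outer boxes, the eigenvector of the smallest eigenvalue is used as Vest-, a flat inner box is assigned the maximum energy of 2, and the names `residual_energy` and `detect_pixel` are hypothetical.

```python
import numpy as np

def residual_energy(x_in, X_out):
    """Residual energy between V_pca,in and V_est- (steps 3 and 4)."""
    x_in = np.asarray(x_in, dtype=float)
    X_out = np.asarray(X_out, dtype=float)
    a = x_in - x_in.mean()
    norm_a = np.linalg.norm(a)
    if norm_a == 0.0:
        return 2.0                                   # assumption: flat inner box treated as clutter
    B = X_out - X_out.mean(axis=1, keepdims=True)
    C = B.T @ B / len(B) - np.outer(a, a)            # assumed form: C_out - C_in
    vals, vecs = np.linalg.eigh(C)                   # eigenvalues in ascending order
    v_est = vecs[:, 0]                               # eigenvector of the smallest eigenvalue
    v_pca = a / norm_a
    if v_pca @ v_est < 0:
        v_est = -v_est                               # same as using the sum when signs differ
    return float(np.sum((v_pca - v_est) ** 2))

def detect_pixel(x_in_x, X_out_x, x_in_y, X_out_y, delta):
    """Steps 5 and 6: keep the smaller residual energy and compare it with delta."""
    e_min = min(residual_energy(x_in_x, X_out_x),
                residual_energy(x_in_y, X_out_y))
    return e_min < delta, e_min

# Example with synthetic vectors: a strong, distinct inner vector in both directions.
rng = np.random.default_rng(1)
outer_x, outer_y = rng.standard_normal((24, 9)), rng.standard_normal((24, 9))
inner_x, inner_y = 100.0 * rng.standard_normal(9), 100.0 * rng.standard_normal(9)
is_target, e_min = detect_pixel(inner_x, outer_x, inner_y, outer_y, delta=0.5)
```

Since both vectors have unit norm and the sign is resolved first, the residual energy always lies between 0 and 2, which bounds the useful range of the threshold δ.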


4.  Results  and  Discussion 


4.1  Image  Database 

We used two image databases, yuma9207_roi and huli9204_roi, from the Comanche database to test
the proposed target detection algorithm. Yuma9207_roi is the more difficult database because
the  images  were  taken  in  the  summer  in  the  Arizona  desert  and  the  background  contains  many 
high  temperature  spots.  Huli9204_roi  is  an  easier  database.  The  images  were  taken  in  the  spring 
in  central  California  and  the  background  is  cool  compared  with  most  of  the  targets  in  the 
database.  Figure  6  shows  image  examples  from  the  two  databases. 
Figure 6. Examples of FLIR images: (a) an image from the database yuma9207_roi, where the
background  contains  many  high  temperature  spots  and  (b)  an  image  from  the  database 
huli9204_roi,  where  the  background  is  cool  compared  with  the  targets  in  the  image. 


The  proposed  algorithm  is  based  on  exploring  the  structural  differences  between  a  target  and  its 
surroundings.  For  targets  in  a  close  range,  the  structural  features  of  the  targets  are  prominent 
within  the  image.  For  targets  in  a  longer  range  image,  the  target  appears  very  small  and  the 
structural  information  is  not  obvious.  In  order  to  test  the  algorithm  performance,  the  algorithm  is 
run  at  the  different  ranges.  We  broke  the  ranges  down  into  three  categories,  short,  medium,  and 
long.  Table  1  shows  the  information  about  these  two  databases.  In  the  database,  some  images 
contain  targets  and  some  do  not.  When  ROC  curves  are  calculated  for  these  databases,  a  roughly 
equal  number  of  images,  with  and  without  targets,  are  randomly  selected.  This  makes  the 
experiment  less  biased. 

Table  1.  Database. 


Database       Total number   Number of images   Number of images   Range: number of images/
               of images      without targets    with targets       number of targets

yuma9207_roi   2644           839                1805               Short range: 804/804
                                                                    Medium range: 430/640
                                                                    Long range: 571/1010

huli9204_roi   1910           686                1224               Short range: 971/974
                                                                    Medium range: 253/403

Table 1 shows the details of the image databases. Images are stored in 10 bits. For the image
database yuma9207_roi, the total number of images is 2644. Among these, the number of
images without targets is 839 and the number of images with targets is 1805. For the images with
targets, the numbers of images in short, medium, and long ranges are 804, 430, and 571,
respectively. For these three ranges, the numbers of targets are 804, 640, and 1010, respectively,
since some images have more than one target. For the image database huli9204_roi, the total
number of images is 1910, where 686 images have no targets and 1224 images have targets. For
the images with targets, there are 971 images in short range, where 974 targets are found, and
there are 253 images in medium range, where 403 targets are found.

4.2  ROC  Results 

Ideally,  the  proposed  algorithm  should  calculate  the  residual  energy  at  all  pixels  in  the  image. 
However,  to  reduce  the  computation  time,  the  algorithm  only  calculates  the  residual  energy  for 
points  that  are  separated  in  distance  by  a  quarter  of  the  size  of  the  inner  window.  As  we  show  in 
the  following,  this  would  not  significantly  affect  the  performance  of  the  proposed  algorithm. 

In figure 7a, the solid line window corresponds to an inner window that is centered at the target
coordinates. The bold dashed line window corresponds to one of the inner windows that our
algorithm selects. Consider the ratio of the overlapping area of these two windows to the area of
a single window; we call this ratio the overlapping area ratio. When this overlapping area ratio
is 1, the calculating point is at the center of the target. When this overlapping area ratio is less
than 1, the calculating point is away from the center of the target. Figures 7b-d show the
normalized histograms of the overlapping area ratios for the images at short, medium, and long
ranges, respectively, of the yuma9207_roi database. From figures 7b-d, about 90% of the images
have overlapping area ratios above 0.70 at all ranges. The huli9204_roi database shows
a similar distribution of the overlapping area ratios.
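
The overlapping area ratio of two equal-size windows can be computed from the offset between their centers. The sketch below assumes axis-aligned w × h windows; the function name `overlap_ratio` and the example window size are hypothetical.

```python
def overlap_ratio(dx, dy, w, h):
    """Overlapping area of two w x h windows offset by (dx, dy), divided by w * h."""
    overlap_x = max(0.0, w - abs(dx))     # overlap width, zero if the windows are disjoint
    overlap_y = max(0.0, h - abs(dy))     # overlap height
    return (overlap_x * overlap_y) / (w * h)

# Example: with grid points spaced a quarter window apart, the nearest grid point
# is at most one eighth of the window size away from the target center in each
# direction (idealized geometry, ignoring image borders).
w_demo, h_demo = 40, 20
centered = overlap_ratio(0, 0, w_demo, h_demo)
worst_case = overlap_ratio(w_demo / 8, h_demo / 8, w_demo, h_demo)
```

In this idealized geometry the worst-case ratio is (7/8)² ≈ 0.77, which is in line with most of the measured ratios lying above 0.70.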


Figure 7. (a) Overlapping area ratio. The solid line window corresponds to an inner window that is centered at the
target coordinates. The bold dashed line window corresponds to one of the inner windows that the
algorithm selects. The overlapping area ratio is defined as the ratio of the overlapping area of these two
windows to the area of a single window; (b) normalized histogram of the overlapping area ratios for short
ranges; (c) normalized histogram of the overlapping area ratios for medium ranges; and (d) normalized
histogram of the overlapping area ratios for long ranges.

ROC  results  are  shown  in  figure  8.  For  the  huli9204  database,  as  is  shown  in  figure  8a,  the  two 
curves  are  for  the  short  and  medium  range  targets.  The  ROC  results  are  very  similar.  For  a 
threshold  corresponding  to  1  false  alarm  per  frame  (per  image),  the  detection  rate  is  about  81%. 
If the false alarm rate is increased to 20 per frame, the detection rate reaches 96%.




Figure  8.  ROC  results  from  the  proposed  implementation:  (a)  database  huli9204_roi  and  (b)  database 
yuma9207_roi. 

For  the  yuma9207  database,  as  is  shown  in  figure  8b,  the  ROC  results  are  plotted  for  short, 
medium,  and  long  ranges.  The  performances  of  these  three  range  bins  are  quite  different.  As 
discussed  above,  for  long  ranges,  the  target  appears  very  small  in  the  image.  In  this  case,  there  is 
not  enough  structural  information  available  in  the  target  chip.  At  1  false  alarm  per  frame,  the 
detection  rate  is  about  23%.  When  20  false  alarms  are  allowed  per  frame,  the  detection  rate  is 
about  46%.  For  medium  ranges,  the  performance  is  improved  to  a  detection  rate  of  40%  for  1 
false  alarm  per  frame  and  71%  for  20  false  alarms  per  frame.  The  best  performance  is  achieved 
for short ranges. The detection rate is 58% at 1 false alarm per frame and 84% at 20 false
alarms  per  frame. 

4.3  Discussion 

As mentioned earlier, to reduce the computation time, the proposed implementation calculates
the residual energy only for points that are separated by a quarter of the size of the inner
window. To assess the performance of this implementation, we compare its result with an ideal
situation. In this ideal situation, the algorithm calculates the residual energy for points that
are separated by the full size of the inner window. The algorithm starts by placing the inner
window at the center of the target, which is obtained from the ground truth information.
Subsequent inner windows are placed so that the inner windows are adjacent, non-overlapping,
and cover as much of the image as possible, subject to both the inner and outer windows being
completely contained within the image.
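
The two placement schemes compared here differ only in the grid spacing and in whether the grid is anchored at the ground-truth target center. A minimal sketch of both follows. It assumes window centers are laid out independently along each axis and that a center is valid only if the outer window fits inside the image; the function names and the example window sizes are hypothetical.

```python
import math


def grid_centers(image_size, outer_size, step, anchor=None):
    """Window centers along one axis such that the outer window stays inside
    the image. If anchor is given, one center coincides with it."""
    half = outer_size / 2.0
    lo, hi = half, image_size - half          # valid range for a center
    if lo > hi:
        return []                             # outer window does not fit
    start = lo if anchor is None else anchor
    k_min = math.ceil((lo - start) / step)    # first grid index inside range
    k_max = math.floor((hi - start) / step)   # last grid index inside range
    return [start + k * step for k in range(k_min, k_max + 1)]


def window_centers(image_shape, outer_shape, step, anchor=None):
    """2-D centers as (y, x) pairs; all shape arguments are (rows, cols)."""
    ay, ax = anchor if anchor is not None else (None, None)
    ys = grid_centers(image_shape[0], outer_shape[0], step[0], ay)
    xs = grid_centers(image_shape[1], outer_shape[1], step[1], ax)
    return [(y, x) for y in ys for x in xs]


if __name__ == "__main__":
    inner, outer = (20, 20), (40, 40)         # hypothetical window sizes
    # Proposed implementation: step is a quarter of the inner window.
    proposed = window_centers((100, 100), outer, (inner[0] / 4, inner[1] / 4))
    # Ideal situation: step equals the inner window, anchored at the target.
    ideal = window_centers((100, 100), outer, inner, anchor=(47, 47))
    print(len(proposed), len(ideal))
```

In this toy setup, the quarter-step grid evaluates many more windows than the anchored, non-overlapping tiling, but it does not need the ground-truth target center.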

Figure 9 shows the ROC results for the proposed implementation and the ideal situation for the
database yuma9207_roi. Figure 9a shows the results for 0 to 25 false alarms per frame. Figure
9b shows an enlarged view of the region between 0 and 5 false alarms per frame. The solid lines
are the ROC results for the ideal situation. The dashed lines are the ROC results for the
proposed implementation, as shown in figure 8b. The upper lines in the figure are ROC results
for the short ranges, the middle lines are for the medium ranges, and the lower lines are for
the long ranges. Overall, the proposed implementation and the ideal situation give similar
results. For the long ranges, the detection rate of the proposed implementation is 23% versus
25% for the ideal situation at 1 false alarm per frame, and 46% versus 48% at 20 false alarms
per frame. For the medium ranges, at 1 false alarm per frame, the detection rates of the
proposed implementation and the ideal situation are 40% and 49%, respectively; at 20 false
alarms per frame, 71% and 72%, respectively. For the short ranges, at 1 false alarm per frame,
the detection rates of the proposed implementation and the ideal situation are 58% and 63%,
respectively; at 20 false alarms per frame, 84% and 85%, respectively.


Figure 9. ROC results of the proposed implementation compared with the ideal situation for the database
yuma9207_roi: (a) ROC results for false alarm rates between 0 and 25 and (b) ROC results for false
alarm rates between 0 and 5.

This result demonstrates that the algorithm performance does not change significantly when the
center of the inner window is not located exactly at the center of the target. However, the
center of the inner window must still be reasonably close to the center of the target. For
example, the overlapping area ratio defined above needs to be over 0.70, as shown here.
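
A simple worst-case bound suggests why a quarter-window spacing is compatible with this requirement. The following is a sketch under assumptions not stated in the report: the inner window and the ground-truth window have the same size $w \times h$, the grid spacing is $w/4$ horizontally and $h/4$ vertically, and the image border does not exclude the grid point nearest to the target center. The nearest grid point is then offset from the target center by at most half a grid step in each direction, so the overlapping area ratio $r$ of the best-placed window satisfies:

```latex
r \;=\; \frac{(w - |\Delta x|)\,(h - |\Delta y|)}{w\,h}
  \;\ge\; \left(1 - \frac{1}{8}\right)^{2}
  \;=\; \frac{49}{64} \;\approx\; 0.77,
\qquad |\Delta x| \le \frac{w}{8}, \quad |\Delta y| \le \frac{h}{8}.
```

This bound applies to the grid window closest to the target. The window that the algorithm actually selects may be a different one, so its ratio can fall below this value.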

The proposed algorithm is also compared with another target detection algorithm, the spatial
anomaly detection algorithm (SADA), which was described in [79], for the database yuma9207.
The ROC curves are shown in figures 10a-b. Figure 10a shows the results for 0 to 25 false
alarms per frame. Figure 10b shows an enlarged view of the region between 0 and 5 false alarms
per frame. The solid lines represent the results for the SADA and the dashed lines are for the
proposed algorithm. The upper lines in the figure are ROC results for the short ranges, the
middle lines are for the medium ranges, and the lower lines are for the long ranges. Overall,
the proposed algorithm has a higher detection performance than the SADA at low false alarm
rates. At higher false alarm rates, the SADA gives slightly higher detection rates. As figure
10b shows, at short ranges, the detection rates of the proposed algorithm and the SADA are
similar. For medium ranges, at 1 false alarm per frame, the detection rate of the proposed
algorithm is 40%, compared with the 29% obtained by the SADA. For long ranges, at 1 false
alarm per frame, the detection rate of the proposed algorithm is 23%, compared with the 18%
obtained by the SADA.


Figure 10. ROC results of the proposed algorithm compared with the SADA for the database yuma9207_roi:
(a) ROC results for false alarm rates between 0 and 25 and (b) ROC results for false alarm rates
between 0 and 5.

Testing every point in two images (each containing one target) showed that the hit points are
concentrated in the target area, suggesting that the performance of the algorithm does not
deteriorate significantly when not every image point is tested. Detection results obtained by
testing each point on one image are shown in figures 11a-c. Figure 11a shows the gradient
image obtained by highpass filtering the FLIR image in the x direction. The values of the
residual energy Emin are displayed as the image gray-level values in figure 11b. The image is
displayed in an inverse mode such that lower Emin values appear brighter. The brighter areas
all lie in the target area and in the transition areas between vegetation and ground, where
strong structural information exists. In figure 11c, the points with Emin values that are less
than the threshold (0.3 in this case) are superimposed on the gradient image. All hit points are
within the target area.

Figure 11. Detection results: (a) gradient image obtained by highpass filtering in the x
direction, (b) residual energy Emin values, and (c) the points with Emin < 0.3
superimposed on the gradient image in (a) as markers (+).
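
The thresholding and inverse display described for figure 11 can be sketched as follows. The sketch assumes the residual energy map is already available as a 2-D list of floats. The function names and the linear mapping to 8-bit gray levels are illustrative choices, not details taken from the report.

```python
def hit_points(e_min, threshold=0.3):
    """Return (row, col) positions whose residual energy is below threshold."""
    return [(r, c)
            for r, row in enumerate(e_min)
            for c, value in enumerate(row)
            if value < threshold]


def inverse_display(e_min):
    """Map residual energies to gray levels in [0, 255] so that lower
    energies appear brighter (inverse display mode)."""
    flat = [v for row in e_min for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0   # avoid division by zero for a constant map
    return [[round(255 * (hi - v) / span) for v in row] for row in e_min]


if __name__ == "__main__":
    # Toy residual energy map for illustration only.
    e_map = [[0.9, 0.2],
             [0.5, 0.29]]
    print(hit_points(e_map))        # positions to mark with '+'
    print(inverse_display(e_map))   # gray levels, lowest energy brightest
```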


As discussed earlier, the range between a target and the sensor changes the target structural
information that appears in FLIR images. The proposed algorithm is designed to detect targets
that appear at short and medium ranges within the FLIR imagery. For targets at long ranges
(where a target appears as a "blob" if it is hot), a different algorithm (such as an energy
detector) that does not rely on detailed structural information is needed.


5.  Conclusions 


An  adaptive  target  detection  algorithm  based  on  exploiting  the  structural  information  within  a 
target  and  its  background  is  presented.  The  algorithm  calculates  the  inner  and  outer  image 
gradient  vectors  representing  the  target  and  its  background,  respectively.  The  EST  and  PCA 
transforms  are  locally  generated  from  the  inner  and  outer  image  gradient  vectors.  The 
differences  between  the  EST  and  PCA  eigenvectors  are  used  to  distinguish  the  target  from  its 
background.  Results  of  testing  the  proposed  algorithm  on  two  large  FLIR  databases  were 
presented. 